The Form is the Substance: Classification of Genres in Text
نویسندگان
چکیده
Categorization of text in IR has traditionally focused on topic. As use of the Internet and e−mail increases, categorization has become a key area of research as users demand methods of prioritizing documents. This work investigates text classification by format style, i.e. "genre", and demonstrates, by complementing topic classification, that it can significantly improve retrieval of information. The paper compares use of presentation features to word features, and the combination thereof, using Naïve Bayes, C4.5 and SVM classifiers. Results show use of combined feature sets with SVM yields 92% classification accuracy in sorting seven genres.
منابع مشابه
Importance and Position of Form and Writing Style in Philosophizing
Philosophical texts have emerged in diverse genres and literary forms. Thinkers take different stands about importance of elements of these works. Some of them consider ornamental and accidental role for literary forms and the others in contrast consider philosophical implication for these elements. Philosophers’ approach to the role of literary elements of philosophical text is influential on ...
متن کاملThe Relationship between Iranian EFL Learners' Reading Comprehension, Vocabulary Size and Lexical Coverage of the Text: The Case of Narrative and Argumentative Genres
This study explored the relationship between EFL learners’ vocabulary size, lexical coverage of the text and reading comprehension texts (narrative & argumentative genres). To this end, 120 male and female out of 180 students studying at Talesh Azad University were selected based on their performance on the Nelson Proficiency Test. A Nelson reading proficiency test was also administered in orde...
متن کاملشناسایی خودکار سبک موسیقی
Nowadays, automatic analysis of music signals has gained a considerable importance due to the growing amount of music data found on the Web. Music genre classification is one of the interesting research areas in music information retrieval systems. In this paper several techniques were implemented and evaluated for music genre classification including feature extraction, feature selection and m...
متن کاملارتقای کیفیت دستهبندی متون با استفاده از کمیته دستهبند دو سطحی
Nowadays, the automated text classification has witnessed special importance due to the increasing availability of documents in digital form and ensuing need to organize them. Although this problem is in the Information Retrieval (IR) field, the dominant approach is based on machine learning techniques. Approaches based on classifier committees have shown a better performance than the others. I...
متن کاملAn Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification
Due to the exponential growth of electronic texts, their organization and management requires a tool to provide information and data in search of users in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing and especially text processing, one of the most basic tasks is automatic text classification. Moreover, text ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2001